CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning

Lam, Man Ho, Wang, Chaozheng, Huang, Jen-tse, Lyu, Michael R.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently demonstrated strong capabilities in code-related tasks, but their robustness in code reasoning under perturbations remains underexplored. We introduce CodeCrash, a stress-testing framework with 1,279 questions from CruxEval and LiveCodeBench, designed to evaluate reasoning reliability under structural perturbations and misleading natural language (NL) contexts. Through a systematic evaluation of 17 LLMs, we find that models often shortcut reasoning by over-relying on NL cues, leading to an average performance degradation of 23.2% in output prediction tasks. Even with Chain-of-Thought reasoning, models still show an average drop of 13.8% due to distractibility and rationalization, revealing a lack of the critical reasoning capability needed to discern actual code behavior. While Large Reasoning Models with internal reasoning mechanisms improve robustness by fostering critical thinking, plausible yet incorrect hints can trigger pathological self-reflection, causing 2-3 times the token consumption and, in extreme cases for QwQ-32B, catastrophic cognitive dissonance. We refer to this phenomenon as Reasoning Collapse. CodeCrash provides a rigorous benchmark for evaluating robustness in code reasoning, guiding future research and development toward more reliable and resilient models.
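The misleading-NL perturbation described in the abstract is easy to picture with a toy case (our own construction, not an actual CodeCrash item): a comment asserts one behaviour while the code implements another, and a model that anchors on the comment predicts the wrong output.

```python
# Toy illustration, not a CodeCrash benchmark item: the comment below is a
# misleading natural-language cue that contradicts the code's real behaviour.
def f(xs):
    # "returns the maximum element"  <- misleading; the code returns the minimum
    return sorted(xs)[0]

# Output-prediction task: a model anchoring on the comment may answer 3,
# but the actual output is 1.
print(f([3, 1, 2]))
```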


A Method details: A.1 Categorical attention

Neural Information Processing Systems

As described in Section 3.2, we implement categorical attention by associating each attention head with a corresponding function. In this example, an attention head (left) calculates the histogram for each position, which allows us to compress the corresponding function. Illustrative programs are depicted in Figures 8 and 9; this construction is illustrated in Figure 9. In this section, we describe additional implementation details for the experiments in Section 4. We train each model for 250 epochs with a batch size of 512 and a learning rate of 0.05, and we take one Gumbel sample per step.
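As a rough sketch of the histogram computation mentioned above (our own illustration, with assumed names and shapes, not the paper's code): a hard-attention head that attends to all positions holding the same token yields, via its row sums, a per-position token count.

```python
import numpy as np

def histogram_head(tokens):
    """Per-position count of equal tokens via a select-equal attention pattern."""
    tokens = np.asarray(tokens)
    attn = (tokens[:, None] == tokens[None, :]).astype(float)  # attend to equal tokens
    return attn.sum(axis=1)  # row sums give each token's count

print(histogram_head(list("hello")))  # [1. 1. 2. 2. 1.]
```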


H-AddiVortes: Heteroscedastic (Bayesian) Additive Voronoi Tessellations

Stone, Adam J., Gosling, John Paul

arXiv.org Machine Learning

This paper introduces the Heteroscedastic AddiVortes model, a Bayesian non-parametric regression framework that simultaneously models the conditional mean and variance of a response variable using adaptive Voronoi tessellations. By employing a sum-of-tessellations approach for the mean and a product-of-tessellations approach for the variance, the model provides a flexible and interpretable means to capture complex, predictor-dependent relationships and heteroscedastic patterns in data. This dual-layer representation enables precise inference, even in high-dimensional settings, while maintaining computational feasibility through efficient Markov Chain Monte Carlo (MCMC) sampling and conjugate prior structures. We illustrate the model's capability through both simulated and real-world datasets, demonstrating its ability to capture nuanced variance structures, provide reliable predictive uncertainty quantification, and highlight key predictors influencing both the mean response and its variability. Empirical results show that the Heteroscedastic AddiVortes model offers a substantial improvement in capturing distributional properties compared to both homoscedastic and heteroscedastic alternatives, making it a robust tool for complex regression problems in various applied settings.
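The dual-layer structure can be sketched in a few lines (our reading of the abstract, with illustrative shapes and random levels, not the authors' implementation): each tessellation assigns one output level per Voronoi cell; the mean sums levels across tessellations, while the variance multiplies them.

```python
import numpy as np

# Toy sketch of the sum-of-tessellations mean and product-of-tessellations
# variance; centres and levels are random placeholders, not fitted values.
rng = np.random.default_rng(1)

def cell(x, centres):
    """Index of the Voronoi cell (nearest centre) containing x."""
    return int(np.argmin(np.linalg.norm(centres - x, axis=1)))

m = 4  # number of tessellations per layer (illustrative)
mean_tess = [(rng.normal(size=(3, 2)), rng.normal(size=3)) for _ in range(m)]
var_tess = [(rng.normal(size=(3, 2)), rng.uniform(0.8, 1.2, size=3)) for _ in range(m)]

def predict(x):
    x = np.asarray(x, dtype=float)
    mu = sum(levels[cell(x, centres)] for centres, levels in mean_tess)           # sum of tessellations
    sigma2 = np.prod([levels[cell(x, centres)] for centres, levels in var_tess])  # product of tessellations
    return mu, sigma2

mu, sigma2 = predict([0.5, -0.2])
```

The multiplicative variance layer keeps sigma2 strictly positive by construction, which is what makes the product form convenient for modelling heteroscedasticity.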


Probabilistic Neural Networks (PNNs) with t-Distributed Outputs: Adaptive Prediction Intervals Beyond Gaussian Assumptions

Pourkamali-Anaraki, Farhad

arXiv.org Machine Learning

Traditional neural network regression models provide only point estimates, failing to capture predictive uncertainty. Probabilistic neural networks (PNNs) address this limitation by producing output distributions, enabling the construction of prediction intervals. However, the common assumption of Gaussian output distributions often results in overly wide intervals, particularly in the presence of outliers or deviations from normality. To enhance the adaptability of PNNs, we propose t-Distributed Neural Networks (TDistNNs), which generate t-distributed outputs, parameterized by location, scale, and degrees of freedom. The degrees of freedom parameter allows TDistNNs to model heavy-tailed predictive distributions, improving robustness to non-Gaussian data and enabling more adaptive uncertainty quantification. We develop a novel loss function tailored for the t-distribution and derive efficient gradient computations for seamless integration into deep learning frameworks. Empirical evaluations on synthetic and real-world data demonstrate that TDistNNs improve the balance between coverage and interval width. Notably, for identical architectures, TDistNNs consistently produce narrower prediction intervals than Gaussian-based PNNs while maintaining proper coverage. This work contributes a flexible framework for uncertainty estimation in neural networks tasked with regression, particularly suited to settings involving complex output distributions.
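The negative log-likelihood underlying such a loss can be written down directly from the standard Student-t density (a minimal sketch of the density, not the authors' exact loss code):

```python
import math

def t_nll(y, mu, sigma, nu):
    """Negative log-likelihood of y under a Student-t distribution with
    location mu, scale sigma, and nu degrees of freedom."""
    z = (y - mu) / sigma
    log_pdf = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )
    return -log_pdf
```

As nu grows, this loss approaches the Gaussian NLL; small nu down-weights large residuals, which is what makes the resulting prediction intervals robust to outliers.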


Decoding specialised feature neurons in LLMs with the final projection layer

Davies, Harry J

arXiv.org Artificial Intelligence

Large Language Models (LLMs) typically have billions of parameters and are thus often difficult to interpret in their operation. Such black-box models can pose a significant risk to safety when trusted to make important decisions. The lack of interpretability of LLMs is more related to their sheer size than to the complexity of their individual components. The TARS method for knowledge removal (Davies et al., 2024) provides strong evidence for the hypothesis that linear layer weights which act directly on the residual stream may have high correlation with different concepts encoded in the residual stream. Building upon this, we attempt to decode neuron weights directly into token probabilities through the final projection layer of the model (the LM-head). Firstly, we show that with Llama 3.1 8B we can utilise the LM-head to decode specialised feature neurons that respond strongly to certain concepts, with examples such as "dog" and "California". This is then confirmed by demonstrating that these neurons can be clamped to affect the probability of the concept in the output. This extends to the fine-tuned assistant Llama 3.1 8B instruct model, where we find that over 75% of neurons in the up-projection layers have the same top associated token compared to the pretrained model. Finally, we demonstrate that clamping the "dog" neuron leads the instruct model to always discuss dogs when asked about its favourite animal. Through our method, it is possible to map the entirety of Llama 3.1 8B's up-projection neurons in less than 15 minutes with no parallelization.
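The decoding idea can be illustrated with random weights (NOT Llama 3.1's actual parameters): a neuron whose weights point along a token's unembedding direction is "decoded" by projecting those weights through the LM head and ranking tokens by the resulting logits.

```python
import numpy as np

# Toy stand-in for an LM head: orthonormal token unembedding directions keep
# the example unambiguous. All shapes and values here are illustrative.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 8
lm_head = np.linalg.qr(rng.normal(size=(d_model, d_model)))[0][:vocab_size]

# Plant a "specialised feature neuron" aligned with token 3, plus a little noise.
neuron_w = lm_head[3] + 0.05 * rng.normal(size=d_model)

logits = lm_head @ neuron_w          # project the neuron's weights through the LM head
top_token = int(np.argmax(logits))   # the token the neuron is most associated with
```

In the paper's setting the same matrix product ranges over real up-projection neurons and the real vocabulary, which is why the whole sweep is cheap.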


Model-diff: A Tool for Comparative Study of Language Models in the Input Space

Liu, Weitang, Li, Yuelei, Li, Ying Wai, Wang, Zihan, Shang, Jingbo

arXiv.org Artificial Intelligence

Comparing two (large) language models (LMs) side-by-side and pinpointing their prediction similarities and differences on the same set of inputs are crucial in many real-world scenarios; for example, one can test whether a licensed model was potentially plagiarized by another. Traditional analysis compares the LMs' outputs on benchmark datasets, which cover only a limited number of inputs designed from the perspectives of the intended applications; such datasets cannot provide test cases from unforeseen perspectives that would help us understand the differences between models without bias. In this paper, we propose a new model comparative analysis setting that considers a large input space where brute-force enumeration would be infeasible. The input space can be simply defined as all token sequences on which an LM would produce low perplexity -- we follow this definition in the paper as it yields the most human-understandable inputs. We propose a novel framework, Model-diff, that uses text generation by sampling and deweights the histogram of sampling statistics to estimate prediction differences between two LMs in this input space efficiently and unbiasedly. Our method achieves this by drawing and counting the inputs at each prediction-difference value in negative log-likelihood. Experiments reveal, for the first time, the quantitative prediction differences between LMs in a large input space, potentially facilitating model analysis for applications such as detecting model plagiarism.
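The deweighting idea, as we read the abstract (a hedged sketch, not the authors' exact estimator): sample inputs from one LM and weight each draw by the reciprocal of its sampling probability, so the histogram over NLL differences estimates, without bias, how many inputs sit at each difference value.

```python
import math
import random

# Two toy "LMs" over single-token inputs; p1 is the sampling model.
random.seed(0)
p1 = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
p2 = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.4}

def nll_diff(x):
    """Prediction difference in negative log-likelihood between the two LMs."""
    return -math.log(p2[x]) + math.log(p1[x])

N = 200_000
hist = {}
for x in random.choices(list(p1), weights=list(p1.values()), k=N):
    d = round(nll_diff(x), 6)
    # Deweight by 1/(N * p1(x)): each input is counted once in expectation,
    # regardless of how often the sampler visits it.
    hist[d] = hist.get(d, 0.0) + 1.0 / (N * p1[x])
```

Here "b" and "c" receive identical predictions under both LMs, so the estimated count at difference 0 converges to 2, while "a" and "d" each contribute a count of 1 at their respective difference values.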


Emulating the Global Change Analysis Model with Deep Learning

Holmes, Andrew, Jensen, Matt, Coffland, Sarah, Shen, Hidemi Mitani, Sizemore, Logan, Bassetti, Seth, Nieva, Brenna, Tebaldi, Claudia, Snyder, Abigail, Hutchinson, Brian

arXiv.org Artificial Intelligence

The Global Change Analysis Model (GCAM) simulates complex interactions between the coupled Earth and human systems, providing valuable insights into the co-evolution of land, water, and energy sectors under different future scenarios. Understanding the sensitivities and drivers of this multisectoral system can lead to a more robust understanding of the different pathways to particular outcomes. The interactions and complexity of the coupled human-Earth systems make GCAM simulations costly to run at scale, a requirement for large ensemble experiments that explore uncertainty in model parameters and outputs. A differentiable emulator with similar predictive power, but greater efficiency, could provide novel scenario discovery and analysis of GCAM and its outputs, requiring fewer runs of GCAM. As a first use case, we train a neural network on an existing large ensemble that explores a range of GCAM inputs related to different relative contributions of energy production sources, with a focus on wind and solar. We complement this existing ensemble with interpolated input values and a wider selection of outputs, predicting 22,528 GCAM outputs across time, sectors, and regions. We report a median $R^2$ score of 0.998 for the emulator's predictions and an $R^2$ score of 0.812 for its input-output sensitivity.
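The $R^2$ score used to grade the emulator is the standard coefficient of determination (the usual definition, not anything GCAM-specific):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - ss_res / ss_tot

print(r2([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # 0.98
```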


Detection of Electric Motor Damage Through Analysis of Sound Signals Using Bayesian Neural Networks

Bauer, Waldemar, Zagorowska, Marta, Baranowski, Jerzy

arXiv.org Artificial Intelligence

Fault monitoring and diagnostics are important to ensure reliability of electric motors. Efficient algorithms for fault detection improve reliability, yet development of cost-effective and reliable classifiers for diagnostics of equipment is challenging, in particular due to unavailability of well-balanced datasets, with signals from properly functioning equipment and those from faulty equipment. Thus, we propose to use a Bayesian neural network to detect and classify faults in electric motors, given its efficacy with imbalanced training data. The performance of the proposed network is demonstrated on real life signals, and a robustness analysis of the proposed solution is provided.


Application of Multimodal Fusion Deep Learning Model in Disease Recognition

Liu, Xiaoyi, Qiu, Hongjie, Li, Muqing, Yu, Zhou, Yang, Yutian, Yan, Yafeng

arXiv.org Artificial Intelligence

This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers are applied to distill advanced features from image-based, temporal, and structured data sources. The fusion strategy component seeks to determine the optimal fusion mode tailored to the specific disease recognition task. In the experimental section, a comparison is made between the performance of the proposed multi-mode fusion model and existing single-mode recognition methods. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics.
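The simplest fusion mode the search space would contain is feature concatenation, sketched below with illustrative shapes (the paper selects its fusion strategy per task; this is not its fixed architecture):

```python
import numpy as np

# Minimal sketch of late feature fusion by concatenation; the per-modality
# encoders (CNN, RNN, transformer) are stubbed out as random feature vectors.
img_feat = np.random.rand(1, 128)  # e.g. CNN features from images
seq_feat = np.random.rand(1, 64)   # e.g. RNN features from temporal data
tab_feat = np.random.rand(1, 32)   # e.g. transformer features from structured data

fused = np.concatenate([img_feat, seq_feat, tab_feat], axis=1)
print(fused.shape)  # (1, 224)
```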